17 research outputs found

    Client-specific Property Inference against Secure Aggregation in Federated Learning

    Full text link
    Federated learning has become a widely used paradigm for collaboratively training a common model among different participants with the help of a central server that coordinates the training. Although only the model parameters or other model updates are exchanged during federated training instead of the participants' data, many attacks have shown that it is still possible to infer sensitive information, such as membership or properties, or even to reconstruct participant data outright. Although differential privacy is considered an effective defense against such privacy attacks, it is also criticized for its negative effect on utility. Another possible defense is secure aggregation, which lets the server access only the aggregated update instead of each individual one; it is often more appealing because it does not degrade model quality. However, observing only the aggregated updates, which are generated by a different composition of clients in every round, may still allow the inference of some client-specific information. In this paper, we show that simple linear models can effectively capture client-specific properties from the aggregated model updates alone, owing to the linearity of aggregation. We formulate an optimization problem across different rounds to infer a tested property of every client from the outputs of the linear models, for example, whether a client holds a specific sample in its training data (membership inference) or whether it misbehaves and attempts to degrade the performance of the common model through poisoning attacks. Our reconstruction technique is completely passive and undetectable. We demonstrate the efficacy of our approach in several scenarios, showing that secure aggregation provides very limited privacy guarantees in practice. The source code will be released upon publication.
    Comment: Workshop on Privacy in the Electronic Society (WPES'23), held in conjunction with CCS'23.
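    Because secure aggregation reveals the sum of the selected clients' updates, and a linear classifier applied to a sum equals the sum of its per-client outputs, the per-round participation pattern turns property inference into a linear system. The toy sketch below (hypothetical data model and variable names, not the authors' released code) illustrates this idea with a least-squares recovery of per-client property scores from aggregate-level scores.

```python
import numpy as np

# Toy illustration of property inference from aggregated updates (hypothetical setup).
rng = np.random.default_rng(0)
n_clients, n_rounds, dim = 20, 200, 50

# Ground-truth per-client property (e.g., "holds the targeted sample"), unknown to the attacker.
prop = rng.integers(0, 2, size=n_clients)

# Direction separating "property" from "no property" updates; in the actual attack this
# linear model would be trained offline on shadow updates (assumed available).
w = rng.normal(size=dim)

# Each client's update: noise plus a property-dependent shift along w (toy data model).
updates = rng.normal(size=(n_clients, dim)) + np.outer(prop, w)

# P[r, c] = 1 if client c participated in round r; participation is known to the server.
P = (rng.random((n_rounds, n_clients)) < 0.3).astype(float)

# Under secure aggregation the server only sees the per-round sums of updates.
aggregates = P @ updates

# By linearity, the classifier score of an aggregate is the sum of per-client scores,
# so the per-client scores can be recovered by least squares across rounds.
y = aggregates @ w
scores, *_ = np.linalg.lstsq(P, y, rcond=None)

predicted = (scores > scores.mean()).astype(int)
print("fraction of clients labeled correctly:", (predicted == prop).mean())
```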

    Private Set Generation with Discriminative Information

    Full text link
    Differentially private data generation techniques have become a promising solution to the data privacy challenge: they enable data sharing under rigorous privacy guarantees, which is essential for scientific progress in sensitive domains. Unfortunately, restricted by the inherent complexity of modeling high-dimensional distributions, existing private generative models struggle with the utility of their synthetic samples. In contrast to existing works that aim to fit the complete data distribution, we directly optimize a small set of samples that are representative of the distribution under the supervision of discriminative information from downstream tasks, which is generally an easier task and more suitable for private training. Our work provides an alternative view on differentially private generation of high-dimensional data and introduces a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
    Comment: NeurIPS 2022, 19 pages.
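    One common way to realize this kind of discriminative supervision is gradient matching under DP-SGD-style sanitization; the sketch below is a minimal, illustrative PyTorch version of that general recipe (hyperparameters and variable names are assumptions, not the paper's released implementation).

```python
import torch

# Illustrative DP gradient matching: synthetic samples are optimized so the gradient they
# induce on a classifier matches a clipped-and-noised gradient of the private data.
torch.manual_seed(0)
n_real, n_syn, dim, n_classes = 256, 10, 20, 2
clip_norm, noise_mult = 1.0, 1.0                      # illustrative DP-SGD-style parameters

real_x = torch.randn(n_real, dim)                     # stands in for the private dataset
real_y = torch.randint(0, n_classes, (n_real,))

syn_x = torch.randn(n_syn, dim, requires_grad=True)   # the small private set being generated
syn_y = torch.arange(n_syn) % n_classes               # fixed, balanced labels
opt = torch.optim.Adam([syn_x], lr=0.1)

model = torch.nn.Linear(dim, n_classes)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(100):
    # Re-draw classifier weights so the match is enforced across many models.
    torch.nn.init.normal_(model.weight)
    torch.nn.init.zeros_(model.bias)

    # Sanitized real gradient: per-example gradients are clipped, summed, noised, averaged.
    per_example = []
    for i in range(n_real):
        model.zero_grad()
        loss_fn(model(real_x[i:i + 1]), real_y[i:i + 1]).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        per_example.append(g / max(1.0, g.norm().item() / clip_norm))
    g_real = torch.stack(per_example).sum(0)
    g_real = (g_real + noise_mult * clip_norm * torch.randn_like(g_real)) / n_real

    # Gradient induced by the synthetic set, kept differentiable w.r.t. syn_x.
    g_syn = torch.autograd.grad(loss_fn(model(syn_x), syn_y),
                                model.parameters(), create_graph=True)
    g_syn = torch.cat([g.flatten() for g in g_syn])

    opt.zero_grad()
    ((g_syn - g_real) ** 2).sum().backward()          # move syn_x toward matching gradients
    opt.step()
```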

    Private and Collaborative Kaplan-Meier Estimators

    Full text link
    Kaplan-Meier estimators capture the survival behavior of a cohort and are one of the key statistics in survival analysis. As with any estimator, they become more accurate in the presence of larger datasets, which motivates multiple data holders to share their data in order to compute a more accurate Kaplan-Meier estimator. However, survival datasets often contain sensitive information about individuals, and it is the data holders' responsibility to protect their data, so naive data sharing is often not viable. In this work, we propose two novel differentially private schemes that are facilitated by our novel synthetic dataset generation method. Based on these schemes, we propose various paths that allow a joint estimation of the Kaplan-Meier curves with strict privacy guarantees. Our contribution includes a taxonomy of methods for this task and an extensive experimental exploration and evaluation based on this structure. We show that we can construct a joint, global Kaplan-Meier estimator that satisfies very tight privacy guarantees with no statistically significant utility loss compared to the non-private centralized setting.
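    For reference, the non-private Kaplan-Meier estimator is Ŝ(t) = ∏_{t_i ≤ t} (1 − d_i / n_i), where d_i is the number of events at time t_i and n_i the number of individuals still at risk. The sketch below contrasts it with a simple illustrative DP variant that noises the per-interval counts; this is only one possible construction and is not the specific schemes proposed in the paper.

```python
import numpy as np

def kaplan_meier(times, events, grid):
    """Standard Kaplan-Meier curve on a fixed time grid (events: 1 = event, 0 = censored)."""
    surv, s = [], 1.0
    for lo, hi in zip(grid[:-1], grid[1:]):
        at_risk = np.sum(times >= lo)
        deaths = np.sum((times >= lo) & (times < hi) & (events == 1))
        if at_risk > 0:
            s *= 1.0 - deaths / at_risk
        surv.append(s)
    return np.array(surv)

def dp_kaplan_meier(times, events, grid, epsilon, rng):
    """Illustrative DP variant: Laplace noise on per-interval counts. The naive budget split
    below ignores that one record affects the at-risk count of several intervals, so a real
    scheme needs a more careful sensitivity analysis (or synthetic data, as in the paper)."""
    surv, s = [], 1.0
    scale = 2 * (len(grid) - 1) / epsilon
    for lo, hi in zip(grid[:-1], grid[1:]):
        at_risk = np.sum(times >= lo) + rng.laplace(0, scale)
        deaths = np.sum((times >= lo) & (times < hi) & (events == 1)) + rng.laplace(0, scale)
        if at_risk > 1:
            s *= float(np.clip(1.0 - deaths / at_risk, 0.0, 1.0))
        surv.append(s)
    return np.array(surv)

rng = np.random.default_rng(0)
times = rng.exponential(10.0, size=2000)
events = rng.random(2000) < 0.8                    # roughly 20% censoring
grid = np.linspace(0.0, 30.0, 31)
gap = np.abs(kaplan_meier(times, events, grid) - dp_kaplan_meier(times, events, grid, 1.0, rng))
print("max deviation of the DP curve:", gap.max())
```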

    Constrained Differentially Private Federated Learning for Low-bandwidth Devices

    Full text link
    Federated learning has become a prominent approach when different entities want to collaboratively learn a common model without sharing their training data. However, federated learning has two main drawbacks. First, it is quite bandwidth-inefficient, as it involves many message exchanges between the aggregating server and the participating entities. This bandwidth and the corresponding processing costs can be prohibitive if the participating entities are, for example, mobile devices. Second, although federated learning improves privacy by not sharing data, recent attacks have shown that it still leaks information about the training data. This paper presents a novel privacy-preserving federated learning scheme. The proposed scheme provides theoretical privacy guarantees, as it is based on differential privacy. Furthermore, it optimizes the model accuracy by constraining the model learning phase to a few selected weights. Finally, as shown experimentally, it reduces the upstream and downstream bandwidth by up to 99.9% compared to standard federated learning, making it practical for mobile systems.
    Comment: arXiv admin note: text overlap with arXiv:2011.0557
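    A minimal sketch of the general idea, assuming the server and clients agree on a small set of trainable coordinates: each client clips and noises only that low-dimensional update, which bounds both the upload size and the dimension the DP noise has to cover. The mask selection, learning rate, and noise parameters below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Sketch: constrain the learnable update to k coordinates, clip and noise only those.
rng = np.random.default_rng(0)
model_dim, k, n_clients = 100_000, 100, 10         # k selected weights out of model_dim
clip, sigma = 1.0, 1.2                             # clipping norm and noise multiplier

mask = rng.choice(model_dim, size=k, replace=False)    # coordinate subset shared by everyone

def client_update(local_grad):
    """Clip and noise the k-dimensional restriction of a client's gradient (Gaussian mechanism)."""
    g = local_grad[mask]
    g = g / max(1.0, np.linalg.norm(g) / clip)         # bound the per-client sensitivity
    return g + rng.normal(0.0, sigma * clip, size=k)

global_w = np.zeros(model_dim)
uploads = [client_update(rng.normal(size=model_dim)) for _ in range(n_clients)]

# The server averages the small updates and writes them back into the full model.
global_w[mask] -= 0.1 * np.mean(uploads, axis=0)
print("floats per upload:", k, "instead of", model_dim)
```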

    Fed-GLOSS-DP: Federated, Global Learning using Synthetic Sets with Record Level Differential Privacy

    Full text link
    This work proposes Fed-GLOSS-DP, a novel privacy-preserving approach to federated learning. Unlike previous linear point-wise gradient-sharing schemes, such as FedAvg, our formulation enables a form of global optimization by leveraging synthetic samples received from clients. These synthetic samples, serving as loss surrogates, approximate local loss landscapes by simulating the utility of real images within a local region. We additionally introduce an approach to measure the effective approximation regions, reflecting the quality of the approximation. The server can therefore recover the global loss landscape and comprehensively optimize the model. Moreover, motivated by emerging privacy concerns, we demonstrate that our approach seamlessly works with record-level differential privacy (DP), granting theoretical privacy guarantees for every data record on the clients. Extensive results validate the efficacy of our formulation on various datasets, where our method consistently improves over the baselines, especially under highly skewed distributions and noisy gradients due to DP. The source code will be released upon publication.
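    Very roughly, and under the assumption that each client has already distilled its local data into a handful of synthetic loss surrogates, the server-side step could look like the sketch below: instead of averaging gradients, the server optimizes the model directly on the pooled surrogates. This is an illustrative reading of the abstract, not Fed-GLOSS-DP itself (which also models approximation regions and record-level DP).

```python
import torch

# Illustrative server-side step: optimize the global model on pooled client surrogates.
torch.manual_seed(0)
dim, n_classes, n_clients = 20, 2, 5
model = torch.nn.Linear(dim, n_classes)
loss_fn = torch.nn.CrossEntropyLoss()

# Stand-ins for the small synthetic sets each client would send (random placeholders here).
client_synthetic_sets = [
    (torch.randn(10, dim), torch.randint(0, n_classes, (10,))) for _ in range(n_clients)
]
xs = torch.cat([x for x, _ in client_synthetic_sets])
ys = torch.cat([y for _, y in client_synthetic_sets])

# "Global" optimization over the recovered loss surrogate (full-batch L-BFGS here).
opt = torch.optim.LBFGS(model.parameters(), max_iter=50)

def closure():
    opt.zero_grad()
    loss = loss_fn(model(xs), ys)
    loss.backward()
    return loss

opt.step(closure)
print("final surrogate loss:", closure().item())
```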

    Federated Learning in Adversarial Settings

    Get PDF
    Federated learning enables entities to collaboratively learn a shared prediction model while keeping their training data local. It avoids data collection and aggregation and therefore mitigates the associated privacy risks. However, it remains vulnerable to various security attacks in which malicious participants aim to degrade the generated model, insert backdoors, or infer other participants' training data. This paper presents a new federated learning scheme that provides different trade-offs between robustness, privacy, bandwidth efficiency, and model accuracy. Our scheme uses biased quantization of model updates and is hence bandwidth-efficient. It is also robust against state-of-the-art backdoor and model degradation attacks, even when a large proportion of the participating nodes are malicious. We propose a practical differentially private extension of this scheme which protects the whole dataset of each participating entity. We show that this extension performs as efficiently as the non-private but robust scheme, even under stringent privacy requirements, although it is less robust against model degradation and backdoor attacks. This suggests a possible fundamental trade-off between differential privacy and robustness.
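    As a hedged illustration of why a biased, low-bit quantizer can also buy robustness, the sketch below uses 1-bit sign quantization with coordinate-wise majority voting; this is a standard construction in the literature and is not claimed to be this paper's exact quantizer, but it shows how each client's influence is capped at one vote per coordinate while uploads shrink to one bit per weight.

```python
import numpy as np

# Sketch: 1-bit (sign) quantization with coordinate-wise majority voting at the server.
rng = np.random.default_rng(0)
dim, n_honest, n_malicious = 1000, 9, 2

true_grad = rng.normal(size=dim)
honest = [np.sign(true_grad + rng.normal(scale=0.5, size=dim)) for _ in range(n_honest)]
malicious = [-np.sign(true_grad) for _ in range(n_malicious)]      # try to flip every coordinate

aggregate = np.sign(np.sum(honest + malicious, axis=0))            # majority vote per coordinate

agreement = np.mean(aggregate == np.sign(true_grad))
print(f"coordinates still pointing the right way: {agreement:.1%}")
print(f"upload per client: {dim // 8} bytes instead of {dim * 4} bytes of float32")
```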

    Privacy-Preserving and Bandwidth-Efficient Federated Learning: An Application to In-Hospital Mortality Prediction

    Get PDF
    Machine learning, and in particular federated machine learning, opens new perspectives for medical research and patient care. Although federated machine learning improves over centralized machine learning in terms of privacy, it does not provide provable privacy guarantees. Furthermore, federated machine learning is quite expensive in terms of bandwidth consumption, as it requires participating nodes to regularly exchange large updates. This paper proposes a bandwidth-efficient, privacy-preserving federated learning scheme that provides theoretical privacy guarantees based on differential privacy. We experimentally evaluate our proposal for in-hospital mortality prediction using a real dataset containing the electronic health records of about one million patients. Our results suggest that strong and provable patient-level privacy can be enforced at the expense of only a moderate loss of prediction accuracy.

    Compression Boosts Differentially Private Federated Learning

    Get PDF
    Federated learning allows distributed entities to train a common model collaboratively without sharing their own data. Although it prevents data collection and aggregation by exchanging only parameter updates, it remains vulnerable to various inference and reconstruction attacks in which a malicious entity can learn private information about the participants' training data from the captured gradients. Differential privacy is used to obtain theoretically sound privacy guarantees against such inference attacks by noising the exchanged update vectors. However, the added noise is proportional to the model size, which can be very large with modern neural networks, and this can result in poor model quality. In this paper, compressive sensing is used to reduce the model size and hence increase model quality without sacrificing privacy. We show experimentally, using two datasets, that our privacy-preserving proposal can reduce the communication costs by up to 95% with only a negligible performance penalty compared to traditional non-private federated learning schemes.
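    The sketch below illustrates the compressive-sensing idea in isolation, under the assumption of a sparse update and a shared random measurement matrix: the client clips and noises the short compressed vector, and the server recovers an approximate update by sparse reconstruction (here a few ISTA iterations). All dimensions and noise scales are illustrative; the paper's actual pipeline and DP calibration may differ.

```python
import numpy as np

# Sketch: compress a sparse update, noise it in the small domain, reconstruct with ISTA.
rng = np.random.default_rng(0)
d, m, sparsity = 2000, 400, 50               # model dim, compressed dim, non-zeros in update
clip, noise_std = 10.0, 0.05                 # noise scale is illustrative, not a DP calibration

A = rng.normal(size=(m, d)) / np.sqrt(m)     # measurement matrix shared via a common seed

update = np.zeros(d)
update[rng.choice(d, sparsity, replace=False)] = rng.normal(size=sparsity)

y = A @ update                                       # client: d floats -> m floats
y = y / max(1.0, np.linalg.norm(y) / clip)           # clipping (a no-op in this toy example)
y = y + rng.normal(0.0, noise_std, size=m)           # noise added in the compressed domain

# Server: a few ISTA iterations for the Lasso problem min_x 0.5*||A x - y||^2 + lam*||x||_1.
lam, step = 0.05, 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(d)
for _ in range(300):
    r = x - step * (A.T @ (A @ x - y))
    x = np.sign(r) * np.maximum(np.abs(r) - step * lam, 0.0)

print("relative reconstruction error:", np.linalg.norm(x - update) / np.linalg.norm(update))
```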

    Federated Learning with Differential Privacy for Bandwidth- and Energy-Constrained Environments (Apprentissage fédéré avec confidentialité différentielle pour les environnements contraints en bande passante et énergie)

    No full text
    In machine learning, several entities may want to collaborate in order to improve the accuracy of their local models. In traditional machine learning, such collaboration requires first storing all entities' data on a centralized server and then training the model on it. Such data centralization can be problematic when the data are sensitive and privacy is required. Instead of sharing the training data, federated learning shares the model parameters between a server, which plays the role of aggregator, and the participating entities. More specifically, at each round the server sends the global model to some participants (downstream). These participants then update the received model with their local data and send the updated gradient vector back to the server (upstream). The server aggregates all the participants' updates to obtain the new global model. This operation is repeated until the global model converges. Although federated learning improves privacy, it is not perfect: sharing the gradients computed by individual parties can leak information about their private training data. Several recent attacks have demonstrated that a sufficiently skilled adversary who can capture the model updates (gradients) sent by individual parties can infer whether a specific record or a group property is present in a given party's dataset. Moreover, complete training samples can be reconstructed purely from the captured gradients. Furthermore, federated learning is not only vulnerable to privacy attacks; it is also vulnerable to poisoning attacks, which can drastically decrease model accuracy. Finally, federated learning incurs large communication costs during the upstream/downstream exchanges between the server and the parties. This can be problematic for applications running on bandwidth- and energy-constrained devices, as is the case for mobile systems. In this thesis, we first propose three bandwidth-efficient schemes that reduce communication costs by up to 99.9%. We then propose differentially private extensions of these schemes, with theoretical privacy guarantees, that outperform the standard differentially private federated learning scheme in accuracy. Finally, we investigate the robustness of our schemes against security attacks and discuss a possible privacy-robustness trade-off, which may spur further research.
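    The round structure described above corresponds to the usual FedAvg-style loop; the toy sketch below (linear regression, illustrative hyperparameters) shows the downstream broadcast, the local gradient computation, and the server-side averaging.

```python
import numpy as np

# Toy FedAvg-style loop for the round structure described above (illustrative values only).
rng = np.random.default_rng(0)
n_clients, dim, n_rounds, lr = 10, 5, 50, 0.1

# Each client holds a local dataset for the same linear regression task y = X @ w_true + noise.
w_true = rng.normal(size=dim)
datasets = []
for _ in range(n_clients):
    X = rng.normal(size=(100, dim))
    datasets.append((X, X @ w_true + 0.1 * rng.normal(size=100)))

def local_gradient(w, X, y):
    """Gradient of the mean squared error on one client's local data."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

global_w = np.zeros(dim)
for _ in range(n_rounds):
    participants = rng.choice(n_clients, size=5, replace=False)             # downstream selection
    grads = [local_gradient(global_w, *datasets[c]) for c in participants]  # upstream updates
    global_w -= lr * np.mean(grads, axis=0)                                 # server aggregation

print("distance to the target model:", np.linalg.norm(global_w - w_true))
```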
